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Abstract 

We  study  an  approach  for  minimizing  a  convex  quadratic  function  subject  to  two 
quadratic  constraints.  This  problem  stems  from  computing  a  trust-region  step  for  an 
SQP  algorithm  proposed  by  Celis,  Dennis  and  Tapia  (1984)  for  equality  constrained 
optimization.  Our  approach  is  to  reformulate  the  problem  into  a  univariate  nonlinear 
equation  <f>(n)  =  0  where  the  function  <f>(p)  is  continuous,  at  least  piecewise  differen¬ 
tiable  and  monotone.  Well-established  methods  then  can  be  readily  applied.  We  also 
consider  an  extension  of  our  approach  to  a  class  of  non-convex  quadratic  functions 
and  show  that  our  approach  is  applicable  to  reduced  Hessian  SQP  algorithms.  Nu¬ 
merical  results  are  presented  indicating  that  our  algorithm  is  reliable,  robust  and  has 
the  potential  to  be  used  as  a  building  block  to  construct  trust-region  algorithms  for 
small-sized  problems  in  constrained  optimization. 

Keywords:  Constrained  optimization,  trust-region  algorithms,  CDT  problem,  univari¬ 
ate  nonlinear  equation,  Newton’s  method. 

Abbreviated  Title:  Computing  a  CDT  trust-region  step. 
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1  Introduction 


In  this  paper,  we  consider  solving  the  following  minimization  problem 

minimize  q(d )  =  gTd  +  \ d?Bd,  d  6  Rn  (1.1) 

subject  to  ||d||  <  6  (1.2) 

and  \\ATd  +  h\\<9,  (1.3) 

where  B  <E  Rnx"  is  symmetric,  A  £  RnXm  (m  <  n),  g  €  Rn,  h  e  Rm,  6  >  0,  0  >  0,  and 
throughout  this  paper  the  norm  ||  •  ||  denotes  the  t?,  norm.  In  order  to  have  a  meaningful 
feasible  set,  we  assume  that 

0  >  0min  =  min{||ATd-|-  A||  :  ||d||  <  <$}.  (1.4) 

Since  only  a  global  solution  is  of  interest  to  us,  the  term  “solution”  always  implies  a  global 
solution. 

The  above  problem  comes  from  applying  the  successive  quadratic  programming  (SQP) 
method  and  a  trust-region  technique  to  minimize  a  function  f(x)  subject  to  the  equality 
constraints  h(x)  =  0.  At  the  A;-th  iteration,  we  want  to  obtain  the  correction  step  dk  to 
the  current  iterate  xk  by  minimizing  a  quadratic  model  q(d)  =  gTd  +  (fBd/2  subject  to 
the  linearized  constraints  A^ d  T  h  =  0,  where  g  =  V  f(xk ),  B  is  the  Hessian  or  an  approx¬ 
imate  Hessian  of  the  Lagrangian  function  with  respect  to  x,  A  =  Vh(xk)  and  h  =  h(xk). 
Meanwhile,  we  also  want  to  impose  the  trust-region  restriction  ||d||  <  S.  The  linearized 
constraints  and  the  trust-region  restriction  are  not  necessarily  compatible  when  h  ^  0  (we 
assume  h  ^  0  in  this  paper),  so  in  order  to  guarantee  a  non-empty  feasible  set,  we  replace 
the  requirement  that  the  linearized  constraints  be  zero  by  a  condition  that  the  norm  of  the 
linearized  constraints  be  within  a  given  tolerance  level.  Eventually,  we  end  up  with  Problem 

(1.1) -(1.3).  This  approach  was  first  proposed  by  Celis,  Dennis  and  Tapia  [2]  and  later  it  was 
also  used  by  Powell  and  Yuan  [11]  in  their  algorithm.  For  brevity,  we  shall  call  Problem 

(1.1) -(1.3)  the  CDT  problem  in  this  paper. 

From  the  first-order  necessary  conditions  for  the  CDT  problem  (the  constraint  qualifi¬ 
cation  is  satisfied  as  is  pointed  out  by  Yuan  [14]  because  the  feasible  set  is  convex  with 


2 


nonempty  interior),  one  can  easily  deduce  that  a  solution  d*  to  the  CDT  problem  satisfies 

(B  +  p*AAT  +  \*I)d *  =  -(g  +  p*Ah),  (1.5) 

where  p*  >  0  and  A*  >  0  are  the  multipliers  of  Problem  (1.1)-(1.3).  The  vector  d*  is  of 
course  feasible.  In  addition,  one  has  the  two  complementarity  conditions 

A*  =  0  and  ||eT||  <  S  or  A*  >  0  and  ||cT||  =  6,  (1.6) 

fi*  =  0  and  \\ATd*  +  h\\  <  6  or  p*  >  0  and  \\ATd*  +  h\\  =  6.  (1.7) 

Without  the  constraint  (1.3),  Problem  (1.1 )-( 1 .2)  is  the  trust-region  subproblem  for  un¬ 
constrained  optimization  and  its  solution  is  well  understood.  The  following  theorem  charac¬ 
terizes  global  solutions  of  Problem  ( 1 . 1 )- ( 1.2) .  Is  was  proved  independently  by  Gay  [4]  and 
Sorensen  [13]. 

Theorem  1.1  (Gay,  Sorensen)  A  vector  d*  is  a  global  solution  to  Problem  (1.1)-(1.2)  if 
and  only  if  ||d*||  <  6  and  for  some  A*  >  0, 

(B  +  \*I)d*  =  -g,  A*(||d*||  —  6)  =  0,  (1.8) 

with  B  T  A  I  positive  semi-definite.  This  A*  is  unique.  Moreover ,  if  B  T  A*/  is  positive 
definite,  then  d*  is  the  unique  global  solution. 

Based  on  this  strong  necessary  and  sufficient  optimality  condition,  very  effective  Newton 
type  algorithms  have  been  constructed  for  obtaining  a  solution  or  an  approximate  solution 
of  Problem  (1 . 1)-(1 .2)  (see  [6]  for  further  references). 

Unfortunately,  with  the  presence  of  constraint  (1.3)  a  similar  necessary  and  sufficient 
optimality  condition  no  longer  exists  as  was  recently  shown  by  Yuan  (1987)  [14].  The  positive 
semi-definiteness  of  the  matrix  (B  A  fi*  AA^  -\- \*  I')  cannot  be  guaranteed  for  general  symmetric 
matrices  B.  Yuan  gave  examples  demonstrating  that  this  matrix  can  have  one  negative 
eigenvalue  when  p  and  A  are  unique  and  can  even  have  two  negative  eigenvalues  in  an 
unfavorable  situation.  The  lack  of  a  necessary  and  sufficient  optimality  condition  and  a 
positive  semi-definite  Hessian  of  the  Lagrangian  (with  respect  to  d)  makes  it  much  more 
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difficult  to  construct  effective  algorithms  for  solving  this  problem.  Nevertheless,  Yuan  gives 
sufficient  conditions  for  a  solution  under  the  assumption  that  the  Hessian  of  the  Lagrangian 
with  respect  to  d  is  positive  semi-definite  at  d*.  The  following  is  Yuan’s  Theorem  2.5  in  [14], 
with  a  uniqueness  result  added  by  us. 

Theorem  1.2  The  conditions  (1-5),  (1-6),  (1.7)  and  the  positive  semi-definiteness  of  the 
matrix  B  +  p*  AAT  +  A  *  I ,  where  fi*  and  A*  are  non-negative,  are  sufficient  for  a  feasible  d* 
to  be  a  solution  of  the  CDT  problem.  Moreover,  if  the  matrix  B  +  p*AAT  +  A*/  is  positive 
definite,  then  the  solution  d*  is  unique. 

Proof:  We  only  prove  the  last  statement  which  was  not  a  part  of  Yuan’s  original  theorem. 
Suppose  that  B  +  p*AAT  +  A*/  is  positive  definite  but  d *  is  not  unique.  Then  there  exists 
another  global  solution  d  such  that  q(d)  =  q(d*).  Since  B  +  p*AAT  +  A*/  is  positive  definite, 
d*  is  the  unique  minimizer  of  the  quadratic 

«M+ Yll^+Af  +  yMI*. 

Therefore, 

f\\ATd‘  +  hf  +  A-IMl2  <  i,'\\ATd+h\\2  +  A-HI2.  (1.9) 

Since  both  d*  and  d  are  global  solutions,  from  the  complementarity  conditions  (1.6)  and 
(1.7),  we  have 

p*\\ATd* +  h\\2  =  p*\\ATd+h\\2  =  p*d2  and  A*||<f||2  =  A*||d||2  =  \*62, 

which  contradicts  (1.9).  Hence  d*  must  be  unique.  □ 

While  the  problem  of  effectively  solving  the  £2-norm  CDT  problem  for  a  general  sym¬ 
metric  matrix  B  is  still  open,  some  algorithms  have  been  recently  proposed  for  other  norms 
or  for  the  ^^norm  but  in  less  general  cases.  For  example,  Celis  et  al  [1]  suggest  solving  a 
modified  CDT  problem  by  restricting  d  to  a  two-dimensional  subspace.  Yuan  [15]  proposes  a 
dual  algorithm  for  solving  the  CDT  problem  for  a  positive  definite  matrix  B.  His  approach 
is  to  solve  the  dual  problem:  maximize  the  Lagrangian  function  of  the  CDT  problem  subject 
to  the  constraints  that  (i)  its  gradient  with  respect  to  d  vanishes  and  (ii)  the  multipliers  are 
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non-negative.  Eliminating  the  first  constraint  by  substitution,  he  reduces  the  problem  to 
maximizing  a  quadratic  concave  function  of  two  variables  (multipliers)  in  the  first  quadrant. 
His  algorithm  basically  uses  Newton’s  method  with  a  line  search  but  a  projected  steepest 
descent  direction  is  used  whenever  Newton’s  direction  is  not  feasible.  Moreover,  special 
care  has  to  be  taken  near  the  boundary  to  ensure  that  the  iterates  stay  in  the  first  quad¬ 
rant.  To  evaluate  the  objective  function  and  its  Hessian,  the  Cholesky  factors  of  the  matrix 
B  +  fiAAT  +  XI  have  to  be  computed  and  stored. 

In  this  paper,  we  propose  a  new  approach  for  solving  the  CDT  problem.  Our  approach 
is  to  reformulate  the  problem  into  a  univariate  nonlinear  equation  <f>(n)  =  0  where  the 
function  </>(p)  is  continuous,  at  least  piecewise  differentiable  and  monotonically  decreasing. 
Well-established  methods  for  root  finding  then  can  be  readily  applied. 

This  paper  is  organized  as  follows.  In  Section  2,  we  show  how  the  CDT  problem  with 
a  convex  q(d )  can  be  reduced  to  a  well-behaved  univariate  nonlinear  equation  </>(p)  =  0.  In 
Section  3,  we  show  that  our  approach  can  be  extended  to  the  CDT  problem  with  a  class 
of  non-convex  q(d)  and  give  an  important  application.  Numerical  results  are  presented  in 
Section  4.  We  give  some  concluding  remarks  in  the  last  section. 

From  now  on,  we  shall  assume  that  B  is  positive  definite  unless  otherwise  specified.  We 
have  to  point  out  that  this  is  an  unsatisfactory  assumption  since  one  of  the  advantages  of 
trust-region  strategy  is  presumably  its  ability  to  handle  indefinite  Hessian  (or  approximate 
Hessian)  matrices.  Nevertheless,  when  a  BFGS  type  quasi-Newton  update  is  used  along  with 
measures  to  ensure  positive  definiteness,  this  assumption  on  B  is  still  reasonable. 


2  Problem  reformulation 


We  first  define  two  functions  of  the  variables  fi  >  0  and  A  >  0,  namely, 


H(n,  A)  =  B  +  iiAAt  +  XI  e  R"xn;  (2.1) 

d(»,  A)  =  X)~1(g  +  fiAh)  e  R".  (2.2) 
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Then,  we  define  three  functions  of  the  variable  d  £  Rn,  namely, 


y(d)  =  A(ATd  +  h)e  Rn; 

(2.3) 

m  =  j(M2-#2)eR; 

(2.4) 

m  =  l(MT<i+A||2-»2)6R. 

(2.5) 

It  is  evident  that  the  constraints  (1.2)  and  (1.3)  are  equivalent  to  'ip(d) 

O 

VI 

~a 

a 

o 

VI 

respectively. 

For  simplicity,  we  shall  adopt  the  following  notational  conventions.  Whenever  the  vector 
d(y,  A)  defined  by  (2.2)  is  substituted  into  the  above  three  functions  of  </,  we  shall  write 
them  as  functions  of  (//,  A)  without  changing  their  names.  For  example, 

y(n,  A)  =  y(d(y,  A))  =  A(ATd(y,  A)  +  h). 

Furthermore,  if  A  is  given  as  a  function  of  /i,  we  shall  write  all  the  above  five  functions  as 
functions  of  a  single  variable  y  without  changing  their  names.  For  instance, 

<j)(fi)  =  <j>(d(^X(n))).  (2.6) 

It  is  worth  noting  that  d  =  V d^{d)  and  y(d)  =  Vd<j)(d). 

Our  approach  for  solving  the  CDT  problem  is  motivated  by  the  following  observation. 
If  the  constraint  (1.3)  of  the  CDT  problem  is  active  (we  will  show  that  this  is  the  case  of 
interest)  and  if  we  can  define  A  as  a  function  of  y  such  that  \(y)  always  satisfies  the  feasibility 
condition 

IKMM)||<*  (2.7) 

and  the  first  complementarity  condition 

A(/0  =  0  or  ij>(y,  A(/0)  =  0,  (2.8) 

(we  will  see  that  there  is  a  natural  way  to  define  such  a  X(y)),  then  the  solution  of  the 
CDT  problem  reduces  to  solving  the  second  complementarity  condition  (1.7),  or  equivalently 
solving  the  univariate  nonlinear  equation 

M  =  0.  (2.9) 
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To  construct  the  desired  function  A (//),  we  notice  that  by  Theorem  1.1,  the  conditions 
(2.2),  (2.7)  and  (2.8)  are  the  necessary  and  sufficient  conditions  for  d(j t/,A)  and  A  to  be  the 
unique  solution  and  the  unique  Lagrange  multiplier,  respectively,  of  the  following  problem: 


minimize  q(d )  +  pf>(d),  d  E  Rn 

subject  to  Ml  <  6. 


(2.10) 


This  problem  is  in  the  standard  form  of  a  trust-region  subproblem  for  unconstrained 
optimization  with  its  solution  and  multiplier  depending  on  the  parameter  p.  We  will  refer  to 
Problem  (2.10)  as  V(p)  to  emphasize  its  dependence  on  the  parameter  p.  Obviously,  "P(O) 
is  Problem  (1.1)-(1.2)  and  it  is  natural  to  use  'P(oo)  to  denote  the  problem: 


minimize  4>(d),  d  6  R” 

subject  to  ||d||  <  6. 


(2.11) 


The  following  lemma  is  a  direct  consequence  of  Theorem  1.1  and  the  positive  definiteness 
assumption  on  B. 


Lemma  2.1  Let  A(/x)  be  the  multiplier  corresponding  to  the  constraint  in  V(p)  ,  then  X(p) 
is  a  well-defined  non-negative  function  of  p  for  p  >  0.  Specifically 


0,  t/’(p,0)<0, 

A(/i),  otherwise , 

where  A(/i)  >  0  is  the  implicit  function  defined  by  the  equation  tf(p,A) 

(2.7)  and  (2.8)  are  satisfied  by  X(p).  Moreover,  the  solution  ofV(p)  is 

d(p)  =  d{p,  \(p)) 

which  is  also  a  well-defined  function  of  p  >  0. 

Once  the  function  A (p)  is  defined  as  in  Lemma  2.1,  we  have  the  following  sufficient 
optimality  condition. 

Lemma  2.2  Let  A (p)  and  d(p)  be  as  in  Lemma  2.1,  and  f(p)  be  defined  by  (2.6).  If  p*  >  0 
is  such  that  f(p*)  =  0,  then  d(p*)  is  the  unique  solution  of  the  CDT  problem. 


(2.12) 

=  0.  Conditions 
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Proof:  This  is  straightforward  from  Theorem  1.2  since  d(p*)  is  obviously  feasible,  conditions 
(1.5),  (1.6)  and  (1.7)  are  all  satisfied  and  B  -f  p*AAT  +  X(p*)I  is  positive  definite.  □ 

We  next  show  that  the  functions  X(p),  d(p)  and  (j)(p )  are  well-behaved. 


Theorem  2.1  Let  A (p)  and  d(p)  be  as  in  Lemma  2.1,  and  4>(p)  be  defined  by  (2.6). 


1.  The  functions  X(p),  d(p)  and  <j)(p )  are  all  continuous  in  [0,+oo). 

2.  The  functions  X(p),  d(p)  and  <j>(p)  are  all  differentiable  in  [0,  +oo)  except  possibly  at 
zeros  of  if(p,  0)  which  are  all  isolated. 


3.  The  derivatives  of  X(p),  d(p)  and  <t>(p)  are: 

w  x  °>  0(/bO)  <  0, 

A  [p)  = 

-d(p)TH(p)-ly(p)ld(p)TH(p)-ld(p),  tf(p,  0)  >  0, 
d\p)  =  -H(p)~ly{p)  -  A  '(p)H(p)~ld(p),  if(p,  0)  ^  0, 

ffp)  =  -y(p)TH(p)~1y(p)  -  Xl(p)y{p)TH(pY1d(p),  tf(p,0)  ^  0. 


(2.13) 

(2.14) 

(2.15) 


4 .  The  function  f>(p)  is  monotonically  decreasing  in  [0,-foo). 


Proof:  From  the  definition  of  d( A)  and  <f>(p),  we  see  that  they  are  continuous  and  differen¬ 
tiable  if  X(p)  is  continuous  and  differentiable.  Hence  it  suffices  to  prove  the  continuity  and 
differentiability  of  X(p).  Now  let  us  assume  po  €  [0,oo). 

First,  we  note  that  the  function  tf(p,  0)  is  continuous.  If  if(po,  0)  <  0,  then  there  exists  a 
neighborhood  of  po  in  [0,  oo)  such  that  if(p,  0)  <  0  for  p  in  that  neighborhood.  By  definition 
(2.12),  A (p)  =  0  for  p  in  that  neighborhood.  Thus,  A (p)  is  continuous  and  differentiable  at 
Po ,  and  X'(po)  —  0  which  gives  the  expression  for  X'(p)  in  (2.13)  for  the  case  fi(p,  0)  <  0. 

Similarly,  if  fi(po)  >  0,  then  there  exists  a  neighborhood  of  p0  in  [0,  oo)  such  that 
if(p,0)  >  0  for  p  in  that  neighborhood.  By  (2.12),  A (p)  =  A (p)  for  p  in  that  neighborhood. 
In  view  of  Theorem  1.1,  there  exists  a  unique  A0  such  that  if(p0,  A0)  =  0.  Through  direct 
calculation,  we  have 

A )tH(^  AJ-'j/fc,  A), 
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and 


=  -d(n,\)T 1d(ft,\). 

The  latter  is  negative  at  (fi0,  A0)  since  iL(//0,A0)  is  positive  definite  and 

||c?(//o,  Ao)||  =  f>  >  0. 

Therefore  A(/i)  is  well-defined  and  differentiable  in  a  neighborhood  of  g0  by  the  well-known 
implicit  function  theorem  (see  [9],  for  example).  In  addition, 

_1  d^{n,X) 

dfj, 

which  gives  the  expression  for  A'(^)  in  (2.13)  in  the  case  ip(n,0)  >  0. 

Let  Ho  be  such  that  */>(^0, 0)  =  0  and  {Hj}^  be  any  sequence  that  converges  to  fio •  To 
prove  the  continuity  of  A (//)  at  /i0,  we  need  to  show  that 


AV)  =  - 


dip(g,X) 

dX 


lim  A (fij)  =  X(fio)  =  0. 

j—*oo 

Without  loss  of  generality,  we  assume  that  0)  >  0  for  all  j  >  1  (otherwise  X(fij)  =  0) 
which  implies 

M»j,KN))\\  =  \\d(Ho,  0)||  =  6 

for  all  j.  Let  j  go  to  infinity  and  let  Amax  =  limsup^^  X(fij).  We  have 

IM(/x0j  Amax)||  =  ||d(^O)0)||, 


or  equivalently, 


{g  + HoAh)T[H(fio,0)  2  -  H(g0,  Amax)  2}(g  +  g0Ah)  =  0.  (2.16) 

There  exists  an  orthogonal  matrix  Q  such  that 

H{go,0)  =  QT[diag(ai)^=1}Q 
where  cq  >0,  i  =  1,2,  ...,n.  Consequently 


H(ho,  Amax)  —  H(g  o,0)  -f-  A  max/  —  QT[diag(cri)™=1  +  XmaxI]Q. 
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It  follows  from  (2.16)  that 


n  /  i  1 

it  U,2  (°i  +  ^max)2 

where  pi  is  the  i-th  element  of  Q(g  +  Po Ah).  Since  the  p,’s  cannot  be  all  zero  otherwise 
d(g o,  0)  =  0  which  contradicts  tp(po,  0)  =  0,  so  we  must  have  at  least  one  index  i  (1  <  i  <  n) 
such  that 

_1_  _  1 

"h  ^max 

This  implies  Amax  =  0  and  leads  to 

lim  A (pj)  =  A (p0)  =  0, 

J— KX) 

recalling  that  Amax  =  limsup^^  A (pj)  and  A (p)  >  0.  Therefore,  A(/i)  is  continuous  at  go. 

So  far  we  have  proved  that  the  function  A (p)  is  continuous  in  [0,  oo)  and  also  differentiable 
except  possibly  at  points  where  ip(p,  0)  =  0.  Since  ip(p,  0)  is  a  non-constant  rational  function 
which  is  real  analytic  in  [0,oo),  all  its  zeros  have  to  be  isolated  by  the  well-known  theory 
of  analytic  functions.  Hence  A (p),  and  in  turn  d{p)  =  d(p,\(p))  and  <j>(p)  =  <f>(p,\(p)) 
which  are  differentiable  with  respect  to  both  p  >  0  and  A  >  0,  are  continuous  and  at  least 
piecewise  differentiable. 

Now  we  calculate  the  derivatives  for  d(p)  and  <f>(p )  when  tj>(p,  0)  ^  0.  Differentiating 
both  sides  of  the  equation 

( B  +  pAAr  +  \(p)I)d(p)  =  —(g  +  pAh) 

and  rearranging  the  terms,  we  have 

H(p)d'(p)  =  —y(p)  -  A \p)d{p) 

which  leads  to  the  expression  (2.14)  for  d'(p).  For  (f>(p ),  we  have 

<t>{p)  =  Vdm^))Td'{p)  =  y(p)Td\p) 

which  gives  (2.15). 
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Finally,  we  need  to  show  that  <f>(p)  is  monotonically  decreasing  in  [0,  oo).  It  suffices 
to  show  that  (j>'(p)  <  0  whenever  the  derivative  exists  because  <j>'(p)  is  at  least  piecewise 
differentiable.  Substituting  A'(^)  into  (2.15),  we  have 

<£'(/*)  =  -y{Y)TH(p)~l  y(p)  <  0 


when  ip(p,Q)  <  0.  For  ift(p,  0)  >  0, 


4>XY)  =  1y{p)  [  1  - 


(d(n)TH(fx)  1y{p)f 


d(0YH(p)-1d(p)y)p)TH(p)-1y(p)t 


<  0 


(2.17) 


by  the  Cauchy-Schwarz  inequality.  This  completes  the  proof.  □ 

Now  we  are  ready  to  reformulate  the  CDT  problem  into  a  well-behaved  univariate  non¬ 
linear  equation. 


Theorem  2.2  Let  the  function  (f>{p)  be  given  as  in  Theorem  2.1.  If  <f){ 0)  <  0,  then  d(0)  is 
the  solution  of  the  CDT  problem;  otherwise,  there  exists  a  p*  >  0  such  that  (f>(p*)  =  0  ( which 
implies  that  the  constraint  (1.3)  is  active).  The  vector  d(p*)  is  the  unique  solution  of  the 
CDT  problem. 


Proof:  Since  d( 0)  is  already  the  solution  of  Problem  (1.1)-(1.2),  if  </>(0)  <  0,  then  the 
constraint  (1.3)  is  also  satisfied  which  implies  that  d( 0)  is  the  solution  of  the  CDT  problem. 

Now  let  us  assume  <£(0)  >  0  and  let  d(oo)  denote  a  solution  of  V(oo)  (i.e.,  Problem  2.11). 
In  view  of  the  definitions  of  6^  and  <j>(p)  (see  (1.4)  and  (2.6)),  we  have 

<Kd( oo))  =  (Cm  -  92)/2  <  0. 

We  first  show  <f>(p)  <  0  for  p  large  enough.  Suppose  that  this  not  true,  i.e.,  <j>(p)  >  0  for  all 
p.  Since  d(p)  is  the  solution  of  V{p)  ,  we  have 

q(d(p))  +  p<f>(p)  <  q(d( oo))  +  p<f>(d( oo)). 

It  then  follows  that 


q(d(p))  -  q(d( oo))  <  /^(d(oo))  =  p(0Cn  ~  02)/2. 
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The  right-hand  side  tends  to  —  oo  as  p,  goes  to  -fioo.  The  left-hand  side,  however,  is  bounded 
below  by  ^(d(0))  —  q{d( oo)).  This  is  a  contradiction.  Therefore,  (f>(p)  <  0  for  p  large  enough. 
In  fact,  <f(p )  <  0  for  p  >  p  where 


H  = 


2[q(d(oo))  -  g(d(0))] 


02  -  02- 
vmin 


(2.18) 


Since  <^>(0)  >  0,  (f>{p)  <  0  for  some  p  >  0,  and  <f(p)  is  continuous  and  monotonically 
decreasing  by  Theorem  2.1,  there  must  exist  a  p*  >  0  such  that  (j>(p*)  =  0.  Finally,  d(p*)  is 
the  solution  of  the  CDT  problem  by  Lemma  2.2.  □ 

We  note  that  because  of  its  monotonically  decreasing  property,  fi{p)  cannot  have  more 
than  one  isolated  zero.  But  it  could  have  more  than  one  zero.  In  that  situation,  there  must 
exist  a  unique  interval  in  which  <f>(p)  is  identically  zero.  This  happens  only  if  both  constraints 
are  active  and  d*  and  y(d*)  are  linearly  dependent  as  shown  by  Yuan  [14].  From  (2.17),  we 
see  that  <j>'(p)  =  0  whenever  d(p)  and  y(p)  are  linearly  dependent. 

It  is  of  interest  to  interpret  our  formulation  geometrically.  Clearly,  d(p)  is  a  continuous 
curve  in  Rn.  The  solution  d(p*)  is  the  point  on  the  curve  where  the  curve  intersects  the 
surface  of  </>(d)  =  0  which  is  in  general  an  elliptical  cylinder  because  the  rank  of  A  is  generally 
less  than  n  for  equality  constrained  optimization  problems. 


3  Extension  to  non-convex  quadratics 

Since  the  second-order  sufficient  condition  for  equality  constrained  optimization  only  requires 
the  Hessian  of  Lagrangian  with  respect  to  x  to  be  positive  definite  on  the  null  space  of 
constraint  gradients,  the  positive  definiteness  requirement  for  B  is  often  (and  rightly)  viewed 
as  too  strong.  It  is  therefore  desirable  to  extend  our  formulation  to  the  CDT  problem  with 
non-convex  q(d),  namely,  B  may  be  indefinite.  The  following  theorem  provides  us  such  an 
extension. 

Theorem  3.1  Let  the  following  two  conditions  hold: 

1.  The  matrix  B  £  Ryxn  ?-n  qdj'  problem  is  positive  definite  on  the  null  space  of  AT , 
i.e.,  pT Bp  >  0  for  all  non-zero  p  6  Rn  such  that  ATp  =  0. 
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2.  There  exists  a  p  G  [0,^*),  where  p*  is  the  multiplier  p*  associated  with  the  constraint 
(1-3),  such  that  B  +  pAAT  is  positive  semi-definite. 

Then  the  functions  X(p),  d(p)  and  cf>(p)  can  be  defined  for  p  >  p  as  in  Theorem  2.1 
and  they  are  all  continuous  and  at  least  piecewise  differentiable  in  (/t,oo).  Moreover ,  <f>(p) 
is  monotonically  decreasing  in  (p,o o).  If  <f>(p)  >  0,  then  there  exist  a  p*  >  p  such  that 
<f>(p*)  =  0  and  the  solution  of  the  CDT  problem  is  d(p*). 

Proof:  We  omit  the  proof  because  it  is  analogous  to  the  proof  of  Theorem  2.2  noting  that 
H(p,  A)  is  positive  definite  for  all  p  in  (p,  oo)  and  all  A  in  [0,  oo).  □ 

Although  the  second  condition  in  the  theorem  seems  to  be  difficult  to  verify,  this  the¬ 
orem  does  have  an  important  application.  It  is  well-known  that  in  SQP  type  algorithms 
for  constrained  optimization,  the  reduced  Hessian  on  the  null  space  of  the  active  constraint 
gradients  is  the  essential  piece  of  the  second-order  derivative  information  for  fast  local  con¬ 
vergence  (see  [10]  for  an  explanation).  It  is  a  very  popular  approach  to  just  use  the  reduced 
Hessian  to  generate  SQP  steps  (see  [3]  and  [8],  to  cite  a  few  examples).  The  advantages  are: 
(i)  a  smaller  matrix  is  stored  and  handled  and  (ii)  near  a  solution  the  reduced  Hessian  is 
usually  positive  definite.  Let  us  assume  now  that  a  reduced  Hessian  approach  is  used  in  the 
CDT  problem  where  A  is  of  full  rank,  then  we  have  B  =  ZMZT,  where  M  G  RmXTn  is  an 
approximation  to  the  reduced  Hessian  and  is  positive  definite  and  Z  G  RnX(n-Tn)  is  a  basis 
for  the  null  space  of  AT  (assuming  that  A  is  of  full  rank).  It  is  easy  to  verify  that  such  a 
matrix  B  satisfies  the  two  conditions  in  Theorem  3.1  with  p  =  0.  So  if  d(0)  is  not  a  solution, 
then  the  CDT  problem  can  be  reduced  to  the  zero  finding  problem  <j>(p)  =  0. 

Unfortunately,  it  is  not  possible  to  directly  extend  our  approach  to  general  symmetric 
matrices  B.  This  is  because  in  our  approach,  H(p,  A)  is  always  kept  positive  definite,  if 
H(p*i  A*)  is  in  fact  indefinite,  as  it  may  be,  and  if  d*  and  y(d*)  are  linearly  independent 
which  implies  that  (p*,  A*)  is  unique,  then  there  is  no  way  that  the  necessary  condition  (1.5) 
can  be  satisfied  by  a  positive  definite  matrix  H(p,  A). 
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4  Numerical  Tests 


We  have  reformulated  the  CDT  problem  with  a  convex  quadratic  objective  function  q(d )  into 
a  problem  of  zero-finding  for  an  at  least  piecewise  differentiable  and  monotonically  decreasing 
function  The  zero-finding  problem  for  univariate  functions  is  perhaps  among  the  oldest 

problems  considered  in  numerical  analysis.  There  are  many  well-established  methods  for 
solving  this  problem  with  guaxanteed  convergence. 

We  have  found  that  instead  of  solving  =  0,  it  is  generally  easier  to  solve  the  equiv¬ 
alent  equation  <£(/*)  =  0,  where 

*(/i)  =  || +  A||  "  T  (4J) 

The  function  $(/x)  is  still  a  monotone  (this  time  increasing)  function  with  the  same 
smoothness  properties  as  <j>(n).  It  is  easy  to  verify  that  $'(/x)  =  —  <j>'(fj,)/\\ATd(n)  +  /j||3.  The 
nice  thing  about  $(//)  is  that  it  usually  exhibits  a  lower  degree  of  nonlinearity  than  <j>([i) 
does.  Figure  1  shows  the  behavior  of  the  two  functions  for  a  random  problem.  It  has  been 
observed  from  numerous  examples  that  this  kind  of  phenomenon  is  not  atypical. 

Figure  1:  Functions  $  (solid  line)  and  <j>  (dashed  line)  vs.  Variable  /x 
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4.1  An  Implementation  of  Newton’s  method 

We  have  implemented  an  algorithm  for  solving  $(fi)  =  0,  which  is  basically  Newton’s  method 
with  a  form  of  safeguarding,  as  given  below. 

Algorithm  1  (Solving  $(p)  =  0)  Given  tolerance  t1,  set  p  =  0,  pe  =  0,  pT  —  oo. 

Step  1  Solve  V(p)  to  obtain  d(p)  and  evaluate  $(//). 

Step  2  If  p  —  0  and  $(p)  >  0,  or  |||v4Td(/x)  +  /i||  —  9\/0  <  T\,  let  d*  =  d(p)  and  exit. 

Step  3  If$(p)  <  0,  then  pe  —  p  and  $e  =  $(/^);  e/se  pr  =  p  and  $T  =  $(fi). 

Step  4  Compute  the  Newton  step:  p,  :=  p  —  $(p)/<$>'(p). 

Step  h  If  p  £  (pe,pr),  use  the  Regula  Falsi  step:  p  :=  pe  —  $e(Pr  ~  9i)l{$r  —  ^)- 
Step  6  Go  to  Step  1. 


A  cause  of  concern  about  the  above  algorithm  is  probably  the  need  to  solve  the  standard 
trust-region  subproblem  V(p)  which  may  be  considered  to  be  expensive.  The  solution  to 
V{p)  is  now  well-understood  and  very  efficient  methods  have  been  developed  during  the  last 
decade  (see  [6]  and  [7],  for  example),  especially  for  positive  definite  matrices  B.  The  method 
perhaps  most  often  used  is  to  solve  the  equation 


||d(/i,A)|| 


-7  =  0, 


for  A,  while  p  is  fixed,  using  Newton’s  method  which  was  proposed  by  Reinsch  [12]  and 
Hebden  [5]  independently.  A  basic  algorithm  for  solving  the  above  equation  can  be  found  in 
[7,  p.p.47],  which  involves  the  Cholesky  factorization  of  H(p,\).  In  our  case,  since  H(p,  A) 
is  positive  definite,  the  success  of  this  algorithm  is  guaranteed.  In  practice,  it  has  been 
reported  [6]  that  on  the  average  less  than  two  iterations  (factorizations)  are  needed  to  obtain 
an  approximate  solution  to  A (//).  Since  there  is  no  need  for  a  line  search  in  our  algorithm 
and  no  need  for  special  measures  to  maintain  feasibility  as  are  required  by  Yuan’s  algorithm 
[15],  the  need  for  solving  V(p)  does  not  seem  to  be  a  formidable  overhead. 
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We  have  used  Newton’s  method  to  solve  the  problem  V(p)  (i.e.,  ^(p,  A)  =  0  for  a  fixed 
p)  as  given  in  More  and  Sorensen’s  paper  [7,  p.p.47],  but  we  have  added  the  restriction  p  >  0 
and  used  a  practical  stopping  criterion.  The  algorithm  is  as  follows. 

Algorithm  2  (Solving  ^(p,  A)  =  0  for  a  fixed  p)  Given  p  and  tolerance  T2,  set  A. 

Step  1  Let  H  =  B  +  pAAT  +  XI  and  solve  Hd  —  —g  for  d. 

Step  2  If  A  =  0  and  ||d||  <  6,  or  |||d[|  —  8\/8  <  T2,  then  let  d*  =  d  and  exit. 

Step  3  Compute  the  restricted  Newton  step: 


A  :=  max 


(o,  A  + 


t Fd  ||d||-A 
< FH-'d  8  )  ' 


Step  4  Go  to  Step  1. 


Interested  readers  are  referred  to  More  and  Sorensen’s  paper  [7]  for  more  details  about 
the  trust-region  subproblem  for  unconstrained  optimization. 

We  note  from  (2.15)  that  in  order  to  compute  the  derivative  $'(//),  the  vectors  H(p)~xd(p) 
and  H(p)~1y(p)  are  needed,  which  can  be  readily  computed  if  the  Cholesky  factor  of  H(p) 
is  stored  while  solving  V(p)  .  In  fact,  the  first  vector  is  already  available  after  solving  V(p)  . 

When  both  the  constraints  (1.2)  and  (1.3)  are  binding,  our  algorithm  can  be  roughly 
summarized  as  follows.  By  the  complementarity,  CDT  problem  reduces  to  the  nonlinear 
system  of  two  equations  with  two  variables 

*(M)  =  o, 

$(//,A)  =  0. 

We  eliminate  the  variable  A  by  solving  the  first  equation  while  fixing  p.  After  substitution, 
we  reduce  the  above  system  into  a  single  equation  $(p,  A (p))  =  0.  Fortunately,  both  the 
first  equation  (for  fixed  p)  and  the  second  equations  (after  the  substitution  A  =  A(^))  are 
well-behaved  monotone  functions  and  easy  to  solve. 

Algorithm  1  and  2  have  been  programmed  in  Matlab  and  run  on  a  SUN  3/160  workstation 
network  at  Rice  University  with  a  machine  epsilon  about  2.22  x  10-16. 
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4.2  Results  for  Yuan’s  Problems 


In  his  paper  [15],  Yuan  used  five  small  test  problems  in  testing  his  algorithm  (see  his  paper 
for  details).  We  ran  our  algorithm  on  these  five  problems,  called  Y1  through  Y5.  We  chose 
the  stopping  tolerances  as  T\  =  10-3  and  =  O.It!,  respectively,  for  Algorithm  1  and  2.  The 
final  solutions  we  obtained  have  at  least  three  and  often  four  significant  digits  in  agreement 
with  Yuan’s  solutions.  We  tabulate  the  number  of  Cholesky  factorizations  of  H(g,  A),  which 
is  the  leading  work  in  both  ours  and  Yuan’s  algorithms,  in  Table  1.  Because  ours  and  Yuan’s 
results  were  obtained  on  different  machines  with  different  stopping  criteria,  Table  1  should 
not  be  considered  as  a  rigorous  comparison,  but  rather  a  reasonable  indicator. 

Table  1:  Number  of  Factorizations  for  Yuan’s  Problems 


Test  Problem 

Y1 

Y2 

Y3 

Y4 

Y5 

Yuan’s  Algorithm 

8 

6 

11 

14 

11 

Our  Algorithm 

D 

2 

2 

15 

10 

From  the  table,  we  can  see  that  for  Problems  Y2  and  Y3  for  which  only  one  of  the 
two  constraints  is  active  (i.e.,  either  g  =  0  or  A  =  0),  our  algorithm  appears  to  be  much 
faster  than  Yuan’s.  This  probably  should  not  be  too  much  of  a  surprise,  because  unlike 
Yuan’s  algorithm,  which  searches  in  both  //  and  A  directions  simultaneously,  our  algorithm 
searches  along  the  two  directions  alternatively.  For  the  remaining  three  problems  for  which 
both  constraints  are  active,  our  algorithm  seems  to  be  comparable  with  Yuan’s  algorithm  in 
terms  of  number  of  matrix  factorizations. 

4.3  Results  for  Random  Problems 

Randomly  generated  problems  are  also  used  in  our  preliminary  numerical  experiments.  We 
use  the  Matlab  M-file  “rand”  to  generate  normally  distributed  random  numbers  to  form 
the  needed  matrices  and  vectors:  A,B,g  and  h.  To  ensure  the  positive  definiteness  of  B , 
we  first  generate  a  n  x  n  matrix  C  and  then  let  B  =  CTC.  The  trust-region  radius  6  is 
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also  generated  randomly  but  the  absolute  value  is  taken  to  ensure  positivity.  To  ensure  the 
constant  9  >  9,n\n  (see  (1.4)),  we  let  9  be  the  best  decrease  on  \\ATd  +  h\\  in  the  steepest 
descent  direction  within  the  trust-region  ||d||  <  6,  as  was  suggested  by  Celis,  Dennis  and 
Tapia  [2].  More  specifically, 

9  =  ||  AT{-ctAh)  +  h\\ 

where 

/  S  hT(ATA)h\ 
a  ~  mm  \JXh\y  hT(ATA)2h )  ‘ 

We  mention  that  another  existing  way  of  choosing  9  is 

9  =  min{||ATd  +  h\\  :  ||d||  <  r8} 

for  some  r  <  1,  proposed  by  Powell  and  Yuan  [11]. 

Our  test  results  on  five  random  problems,  R1  to  R5,  with  various  sizes  are  presented  in 
Table  2.  To  see  how  long  a  step  is  and  how  much  reduction  in  the  objective  function  as  well 
as  in  the  linearized  constraint  is  achieved  ,  we  include  in  the  table  the  four  quantities  q(d*) 
(note  <?(0)  =  0),  ||d*||,  \\h\\  and  +  h\\  where  d*  is  the  computed  solution.  Listed  in 

the  table  are  also  the  number  of  outer  iterations  and  the  number  of  Cholesky  factorization 
Nfac  needed  for  Algorithm  1  to  reach  a  solution  for  each  problem.  We  note  that  the  number 
Nfac  is  also  equal  to  the  accumulated  total  number  of  inner  iterations  taken  by  Algorithm  2 
embedded  in  Algorithm  1. 


Table  2:  Results  for  Random  Problems 


Problem 

n  :  m 

q(d*) 

M 

\\h\\ 

||  ATd*  +  h\\ 

Iter 

Nfac 

R1 

4  :  2 

-0.02352 

0.01449 

0.30307 

0.29380 

2 

6 

R2 

8  :  4 

-1.05896 

1.05458 

2.52440 

1.85602 

4 

12 

R3 

12  :  6 

-0.28778 

0.11656 

2.20569 

2.11969 

3 

9 

R4 

16  :  8 

-1.74549 

0.43507 

2.67182 

2.33413 

3 

9 

R5 

20  :  10 

-1.26240 

0.81427 

3.04088 

2.32549 

3 

10 
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Many  more  experiments  with  random  problems  have  been  done.  The  set  of  problems 
presented  in  Table  2  is  just  a  small  sample.  The  choice  of  this  sample  is  such  that  both 
constraints  (1.2)  and  (1.3)  are  binding  but  otherwise  is  arbitrary.  We  have  found  that  it 
is  often  easier  to  solve  problems  with  only  one  constraint  binding  instead  of  two.  We  also 
have  observed  from  our  computational  experiences  that  the  extent  of  difficulty  in  solving  a 
CDT  problem  by  our  algorithm  is  mainly  dictated  by,  aside  from  the  conditioning  of  involved 
matrices,  the  values  of  the  constants  9  and  6  and  their  relation. 

4.4  Accuracy  vs.  Computational  work 

From  the  nature  of  the  trust-region  strategy  as  a  device  of  enforcing  global  convergence  and 
also  from  theoretical  results  developed  for  trust-region  algorithms  in  unconstrained  optimiza¬ 
tion,  it  is  safe  to  say  that  in  general  one  need  not  solve  the  CDT  problem  to  a  high  accuracy. 
A  reasonably  good  approximate  solution  is  all  we  are  asking  for  at  least  from  the  practical 
point  of  view.  Therefore,  we  believe  that  our  stopping  criteria  can  be  further  relaxed  without 
affecting  the  practical  performance  of  the  trust-region  algorithm.  In  doing  so,  the  effort  re¬ 
quired  to  obtain  a  satisfactory  step  can  generally  be  significantly  reduced.  To  illustrate  this, 
we  ran  our  algorithm  on  a  random  problem  for  stopping  tolerances  T\  =  10— 3 , 10~2, 10“ 1  and 
r2  =  O.Iti.  The  data  for  this  problem  are  n  =  10,  m  =  5,  8  =  1,  0  =  3  and  ||/j||  =  3.704085. 
The  relevant  quantities  obtained  from  the  algorithm  at  termination  are  given  in  Table  3, 
where  again  Njac  denotes  the  number  of  Cholesky  factorization. 


Table  3:  Effects  of  Stopping  Tolerances 


Tolerance 

q(d ) 

Ml 

\\ATd+h\\ 

9- 

A 

Nfa  c 

n  =  lo-3 

-1.17216 

1.00000 

3.00003 

0.52841 

0.05215 

12 

n  =  10"2 

-1.15070 

1.00019 

2.98652 

0.53393 

0.05168 

10 

n  =  io-1 

-0.84760 

1.00042 

2.80405 

0.61749 

0.07847 

6 
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5  Concluding  remarks 


Compared  with  Yuan’s  algorithm  [15],  the  only  existing  one  for  computing  an  £2-norm  CDT 
trust-region  step  in  all  of  Rn  that  we  are  currently  aware  of,  our  algorithm  is  conceptually 
simpler  because  zero  finding  for  a  monotone  function  of  a  single  variable  is  better  understood 
than  locating  a  maximizer  in  the  first  quadrant  of  a  concave  function  of  two  variables.  The 
limited  numerical  experiments  do  seem  to  confirm  that  our  algorithm  is  indeed  reliable  and 
robust  as  expected. 

As  far  as  efficiency  is  concerned,  we  believe  that  methods  of  any  kind,  including  ours,  for 
solving  the  n-dimensional  t^-norm  CDT  problem  are  not  likely  to  be  cheap  in  the  context 
of  computing  an  iterative  step  for  a  trust-region  algorithm  for  which  the  CDT  problem  is 
merely  a  subproblem  that  has  to  be  solved  at  each  iteration.  On  the  other  hand,  solving  the 
CDT  problem  supposedly  produces  better  iterative  steps  and  therefore  enhences  robustness 
of  global  convergence.  From  our  computational  experiments,  we  have  been  led  to  believe 
that  our  approach  can  produce  good  steps  for  small-sized  problems  (say,  n  <  20)  at  very 
affordable  costs,  given  the  fact  that  high  accuracy  in  computed  steps  is  not  needed  in  trust- 
region  algorithms. 

To  summarize,  we  conclude  that  our  method  has  the  potential  to  be  used  as  a  build¬ 
ing  block,  along  with  a  BFGS-type  (reduced  or  full)  Hessian  approximation  technique,  to 
construct  reliable  and  robust  trust-region  algorithms  for  small-sized  problems  in  constrained 
optimization. 
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